NeuralDPS: Neural Deterministic Plus Stochastic Model with Multiband Excitation for Noise-Controllable Waveform Generation
Authors
Abstract
Traditional vocoders have the advantages of high synthesis efficiency, strong interpretability, and speech editability, while neural vocoders have the advantage of high synthesis quality. To combine the advantages of the two kinds of vocoder, and inspired by the traditional deterministic plus stochastic model, this paper proposes a novel neural vocoder named NeuralDPS, which retains high speech quality while acquiring high synthesis efficiency and noise controllability. Firstly, the framework contains four modules: a deterministic source module, a stochastic source module, a neural V/UV decision module, and a neural filter module. The only input the vocoder requires is the spectral parameter, which avoids the error caused by estimating additional parameters such as F0. Secondly, to solve the problem that different frequency bands may have different proportions of deterministic and stochastic components, a multiband excitation strategy is used to generate a more accurate excitation signal and reduce the neural filter's burden. Thirdly, a method to control the noise components of speech is proposed. In this way, the signal-to-noise ratio (SNR) of speech can be adjusted easily. Objective and subjective experimental results show that the proposed NeuralDPS vocoder obtains performance similar to WaveNet while generating waveforms at least 280 times faster than the WaveNet vocoder. It is also 28% faster than WaveGAN's synthesis efficiency on a single CPU core. We have also verified through experiments that this method can effectively control the noise components in the predicted speech and adjust the SNR of speech. Examples of generated speech can be found at https://hairuo55.github.io/NeuralDPS.
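The SNR adjustment mentioned in the abstract can be illustrated with a minimal sketch. This is not the NeuralDPS implementation: in the paper the deterministic and stochastic components come from neural modules, whereas here a 200 Hz sine wave and Gaussian noise stand in for them. The function names (`mean_power`, `snr_db`, `mix_at_snr`) and all parameter values are assumptions chosen for illustration. The sketch only shows how a gain on the stochastic component follows from a target SNR, using SNR_dB = 10 log10(P_signal / P_noise).

```python
import math
import random


def mean_power(samples):
    """Average power (mean of squared samples) of a sequence."""
    return sum(x * x for x in samples) / len(samples)


def snr_db(signal, noise):
    """Signal-to-noise ratio in decibels between two component signals."""
    return 10.0 * math.log10(mean_power(signal) / mean_power(noise))


def mix_at_snr(deterministic, stochastic, target_snr_db):
    """Rescale the stochastic component so the mix has the requested SNR.

    Hypothetical helper for illustration only; returns (mixed, scaled_noise).
    """
    if len(deterministic) != len(stochastic):
        raise ValueError("components must have the same length")
    p_signal = mean_power(deterministic)
    p_noise = mean_power(stochastic)
    if p_signal == 0.0 or p_noise == 0.0:
        raise ValueError("both components must have non-zero power")
    # Solve 10*log10(p_signal / (gain^2 * p_noise)) = target_snr_db for gain.
    gain = math.sqrt(p_signal / (p_noise * 10.0 ** (target_snr_db / 10.0)))
    scaled_noise = [gain * n for n in stochastic]
    mixed = [d + n for d, n in zip(deterministic, scaled_noise)]
    return mixed, scaled_noise


# Stand-in components (assumptions): one second of a 200 Hz sine at 16 kHz
# as the deterministic part, seeded Gaussian noise as the stochastic part.
SAMPLE_RATE = 16000
rng = random.Random(0)
harmonic = [math.sin(2.0 * math.pi * 200.0 * i / SAMPLE_RATE)
            for i in range(SAMPLE_RATE)]
noise = [rng.gauss(0.0, 1.0) for _ in range(SAMPLE_RATE)]

mixed, scaled_noise = mix_at_snr(harmonic, noise, target_snr_db=20.0)
print(round(snr_db(harmonic, scaled_noise), 3))
```

Raising `target_snr_db` attenuates the stochastic component and lowering it amplifies that component, while the deterministic component is left unchanged.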
Similar resources
Efficient Speech Synthesis System using the Deterministic plus Stochastic Model
In this paper, a high-quality concatenative synthesis system using the deterministic plus stochastic model of speech is described, in which the prosodic modifications are performed by means of very simple and efficient operations, as we reported in a previous work [11]. In particular, pitch synchrony is not necessary, and linear interpolations substitute other types of estimation. The method for...
A hybrid stochastic-deterministic optimization method for waveform inversion
Present-day high quality 3D acquisition can give us lower frequencies and longer offsets with which to invert. However, the computational costs involved in handling this data explosion are tremendous. Therefore, recent developments in full-waveform inversion have been geared towards reducing the computational costs involved. A key aspect of several approaches that have been proposed is a dramat...
Modified multiband excitation model at 2400 bps
This paper presents the Modified Multiband Excitation Model used for speech coding. In many MBE model coders, speech quality is degraded when incorrect voicing decisions are made, particularly for high-pitched female speakers. The MMBE addresses this issue with a modified voiced/unvoiced decision algorithm and a more robust pitch estimate. The listening quality of speech produced using the MMBE m...
Verhulst model with Lévy white noise excitation
The nonlinear stochastic systems with noise excitation have attracted extensive attention and the concept of noise-induced transitions has got a wide variety of applications in physics, chemistry, and biology (1). Noise-induced transitions are conventionally defined in terms of changes in the number of extrema in the probability distribution of a system variable and may depend both quantitative...
Inverse Filtering Based Harmonic Plus Noise Excitation Model for HMM-Based Speech Synthesis
In this paper, a new Voicing Cut-Off Frequency (VCO) estimation method based on inverse filtering is presented. The spectrum of residual signal got from inverse filtering is split into sub-bands which are clustered into two classes by using K-means algorithm. And then, the Viterbi algorithm is used to search a smoothed VCO contour. Based on this new VCO estimation method, an adaptation of Harmo...
Journal
Journal title: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Year: 2022
ISSN: 2329-9304, 2329-9290
DOI: https://doi.org/10.1109/taslp.2022.3140480